02:31
2026-05-29
lesswrong.com
ai-safety
Suggestions for improving debate protocols in AI safety
Researchers reviewing AI safety debate protocols found that current "propose-critique-decide" models are vulnerable to gaming, where critic models exploit a "last mover advantage" by withholding key c…